ABSTRACT


The increasing availability of data from multi-site randomized trials provides a 
potential opportunity to use instrumental variables methods to study the effects of 
multiple hypothesized mediators of the effect of a treatment. We derive nine assumptions 
needed to identify the effects of multiple mediators when using site-by-treatment 
interactions to generate multiple instruments. Three of these assumptions are unique to 
the multiple-site, multiple-mediator case: 1) the assumption that the mediators act in 
parallel (no mediator affects another mediator); 2) the assumption that the site-average 
effect of the treatment on each mediator is independent of the site-average effect of each 
mediator on the outcome; and 3) the assumption that the site-by-compliance matrix has 
sufficient rank. The first two of these assumptions are non-trivial and cannot be 
empirically verified, suggesting that multiple-site, multiple-mediator instrumental 
variables models must be justified by strong theory. 



1. INTRODUCTION 


In canonical applications of the instrumental variable method, exogenously 
determined exposure to an instrument induces exposure to a treatment condition which in 
turn causes a change in a later outcome. A crucial assumption known as the exclusion 
restriction is that the hypothesized instrument can influence the outcome only through its 
influence on exposure to the treatment of interest (Heckman & Robb, 1985b; Imbens & 
Angrist, 1994). It may be the case, however, that an instrument affects the outcome 
through multiple treatments, in which case a single instrument will not suffice to identify 
the causal effects of interest. 

To cope with this problem, analysts have recently exploited the fact that a causal 
process is often replicated across multiple sites, generating the possibility of multiple 
instruments in the form of site-by-instrument interactions. These multiple instruments 
can, in principle, enable the investigator to identify the impact of multiple processes 
regarded as the mediators of the effect of an instrument. Kling, Liebman, and Katz (2007), 
for example, used random assignment in the Moving to Opportunity (MTO) study as an 
instrument to estimate the impact of neighborhood poverty on health, social behavior, 
education, and economic self-sufficiency of adolescents and adults. Reasoning that the 
instrument might affect outcomes through mechanisms other than neighborhood poverty, 
they control for a second mediator, use of the randomized treatment voucher. To do so, 
they capitalize on the replication of the MTO experiment in five cities, generating ten¹
instruments ("site-by-randomization interactions") to identify the impact of the two
mediators of interest, neighborhood poverty and experimental compliance. Using a similar
strategy, Duncan, Morris, and Rodrigues (2011) use data from sixteen implementations of
welfare-to-work experiments to identify the impact of family income, average hours
worked, and receipt of welfare as mediators.

¹ The five sites generate ten site-by-treatment interactions as instruments because there were three
(randomly assigned) treatment conditions per site, and hence two treatment indicators in each site.

Clearly, this strategy for generating multiple instruments has potentially great 
appeal in research on causal effects in social science. For example, Spybrook (2008) found 
that, among 75 large-scale experiments funded by the US Institute of Education Sciences 
over the past decade, the majority were multi-site trials in which randomization occurred 
within sites. In principle, these data could yield a wealth of new knowledge about causal 
effects in education policy. It is essential, however, that researchers understand the 
assumptions required to pursue this strategy successfully. To date, we know of no 
complete account of these assumptions. 

Our purpose therefore is to clarify the assumptions that must be met if this 
"multiple-site, multiple-mediator" instrumental variables strategy (hereafter MSMM-IV) is 
to identify the average treatment effects (ATEs) in the populations of interest. For simplicity of
exposition, and corresponding to the applications of MSMM-IV to date, we consider the
case where a single instrument (which we denote as T) operates through a set of
mediators $M = (M_1, M_2, \ldots, M_P)$ that are linearly related to an outcome Y. We conclude
that, in addition to the assumptions typically required in the single-site, single-instrument, 
single-mediator case, three additional assumptions are required in the MSMM-IV case. 

We begin by delineating the assumptions required for identification in the case of a 
single instrument and a single mediator within a single-site study. We describe the 
assumptions needed to identify the "local average treatment effect" (LATE) described by 
Angrist, Imbens, and Rubin (1996) and the (slightly different) assumptions needed to
identify the average treatment effect (ATE) in the population. Additionally, we
consider the general case where both the instrument and the mediator may be continuous 
or multi-valued. 

Following a discussion of the single-site, single-mediator case, we then turn our
attention to the case of primary interest: the MSMM-IV design. We specify a set of nine 
assumptions required for the MSMM-IV model to identify the average treatment effects of 
the mediators, three of which are specific to the MSMM-IV case, and which we discuss in 
some detail. 


2. THE SINGLE-SITE, SINGLE-MEDIATOR CASE 
Notation 

Suppose that each participant in a single-site study is exposed to a treatment T
taking on values in the domain $\mathbb{T}$. We hypothesize that T may affect some outcome Y
through its effect on some mediator M. Thus, in our notation, T is an instrument that will
be used to identify the effect of some mediator M. We often consider treatments taking on
values in the domain $\mathbb{T} = \{0,1\}$, where $T = 1$ if the participant is assigned to the
"treatment" condition or $T = 0$ if she is assigned to the alternative "control" condition.
Likewise, we often consider mediators taking on values in the domain $\mathbb{M} = \{0,1\}$, where
$M = 1$ if the individual experiences the mediator condition and $M = 0$ if she does not.
More generally, however, both T and M may be multi-valued or continuous.

Note that our terminology and notation differ here from those in standard 
econometric discussions of instrumental variables. In the econometric tradition, an 
instrument Z is used to identify the effect of a treatment T on an outcome Y. In this
tradition, the reduced form effect of Z on Y is often not of substantive interest; rather, Z is
of interest to the econometrician largely because it may be "instrumental" in identifying the
effect of T on Y. In our terminology, however, assignment to a treatment T (such as an
intervention or policy condition) is used as an instrument to identify the effect of mediator
M on an outcome Y. Our terminology derives from the program evaluation tradition, in 
which both the reduced-form effect of T on Y and the effect(s) of the mediator(s) through 
which T may operate are of interest. Throughout the remainder of this paper, we shall use 
T to denote a treatment assignment condition that is used as an instrument, and we shall 
use M to denote an experienced mediator condition. 

Figure 1 summarizes our notation. We refer to the effect of T on M as the
"compliance"; the person-specific compliance is denoted $\Gamma$; the average compliance in the
population is $\gamma = E[\Gamma]$. Similarly, the person-specific effect of the mediator M on the
outcome Y is denoted as $\Delta$; the average effect of M on Y in the population (often the
estimand of interest) is denoted as $\delta = E[\Delta]$. Finally, we denote the person-specific effect
of T on Y as B; the average effect of T on Y in the population (often referred to as the
"intent-to-treat" effect in the program evaluation literature, or the "reduced form" effect in
the econometrics literature) is therefore $\beta = E[B]$.

Figure 1: 
Identifying Assumptions


In order to define a set of causal estimands of interest, we first require the 
assumption that an individual’s potential outcomes depend only on the treatment 
condition and mediator condition to which that particular individual is exposed (and not on 
the treatment and mediator conditions of others), known as the Stable Unit Treatment 
Value Assumption (SUTVA) (Rubin, 1986). In the standard potential outcomes framework, 
we typically require a single SUTVA assumption stating that one individual’s potential 
outcomes do not depend on others’ treatment status. In the IV model, however, the 
presence of three variables of interest — the treatment T, a mediator M, and an outcome 
Y — necessitates a pair of such assumptions (Angrist, et al., 1996), stated formally below. 

Assumption (i): Stable unit treatment value assumptions (SUTVA):

(i.a) Each unit i has one and only one potential value of the mediator M for each
treatment condition t: in particular, for a population of size N,
$m_i(t_1, t_2, \ldots, t_N) = m_i(t_i)$ for all $i \in \{1, 2, \ldots, N\}$.

(i.b) Each unit i has one and only one potential outcome value of Y for each pair of
values of treatment condition t and mediator value m: in particular, for a
population of size N, $y_i(t_1, t_2, \ldots, t_N, m_1, m_2, \ldots, m_N) = y_i(t_i, m_i)$ for all
$i \in \{1, 2, \ldots, N\}$.

Given the SUTVA assumptions, we can represent the potential outcome Y for a participant
who experiences treatment t and mediator value $m(t)$ as $y(t, m(t))$ (we drop the subscript
i throughout the remainder of this paper except when necessary for clarity).

Our second assumption is that T affects Y only through its impact on the mediator
M. This is the standard exclusion restriction assumption:

Assumption (ii): Exclusion restriction:

$y(t) = y(t, m(t)) = y(m(t))$.

The exclusion restriction combined with the second SUTVA assumption (i.b) implies a third
SUTVA condition: (i.c) Each unit i has one and only one potential outcome value of Y for
each value of the mediator m: in particular, for a population of size N,
$y_i(m_1, m_2, \ldots, m_N) = y_i(m_i)$ for all $i \in \{1, 2, \ldots, N\}$.

The SUTVA assumptions are necessary in order to define the causal estimands of
interest. If the treatment variable is binary, for example, the first SUTVA assumption (i.a)
implies that we can define the person-specific causal effect of the treatment on M as
$\Gamma = m(1) - m(0)$. If, however, the treatment is not binary, it will be useful to assume that
the person-specific effect of T on M is linear in T, in which case $\Gamma = m(t) - m(t-1)$:

Assumption (iii): Person-specific linearity of the mediator M in T: The person-specific effect
of T on mediator M is linear. That is, $m(t) = m(0) + t\Gamma$.

Likewise, it will be useful to assume that the person-specific effect of M on Y is
linear in M. This is a standard, if not unproblematic, assumption in IV models. In this case,
the third SUTVA condition (i.c) implies that we can define the person-specific causal effect
of the mediator M as $\Delta = y(m) - y(m-1)$:

Assumption (iv): Person-specific linearity in m: The person-specific effect of the mediator
m(t) on Y is linear. That is, $y(m(t)) = y(m = 0) + m(t)\Delta$.


The combination of (ii), (iii), and (iv) implies that the person-specific effect of T on Y is
linear in T:

$$
\begin{aligned}
y(m(t)) &= y(m(0) + t\Gamma) \\
        &= y(m = 0) + m(0)\Delta + t\Gamma\Delta.
\end{aligned}
\tag{1}
$$


Thus, defining B as the person-specific effect of T on Y, we can relate the person-specific
effects of T on M and of M on Y to the person-specific effect of T on Y by

$$
y(t) - y(t-1) = B = \Gamma\Delta. \tag{2}
$$

The population average intent-to-treat effect (ITT) of interest here is $E(B) = \beta$. The
parameter $\beta$ is not directly observable, however, because it is the mean of differences in
counterfactual outcomes. If we are justified in assuming that persons are assigned
ignorably to treatments $T = t$ for $t \in \mathbb{T}$, as would be true in a randomized experiment, we
can estimate $\beta$ from sample data.


Assumption (v): Ignorable treatment assignment: $T \perp Y(t)$, $T \perp M(t)$, for all $t \in \mathbb{T}$.


Likewise, assumption (v) enables us to estimate $E(\Gamma) = \gamma$, the average causal effect of T on
the mediator M (which we refer to as the "average compliance") from sample data.

Because instrumental variables methods rely on the instrument to induce some exogenous
variation in the mediator (for at least some individuals), we require $\gamma$ to be non-zero:

Assumption (vi): Effectiveness of the instrument: $\gamma \neq 0$.


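In the simple case in which we have a single instrument and a single mediator, the target of
the instrumental variables estimator is the ratio of the intent-to-treat effect to the average
compliance:

$$
\frac{\beta}{\gamma} = \frac{E[\Gamma\Delta]}{E[\Gamma]} = \frac{\gamma\delta + \mathrm{Cov}(\Gamma,\Delta)}{\gamma} = \delta + \frac{\mathrm{Cov}(\Gamma,\Delta)}{\gamma}. \tag{3}
$$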

Equation (3) may be regarded as defining a "compliance-weighted average treatment
effect" (CWATE) because each person’s treatment effect $\Delta$ is weighted by his or her
compliance, $\Gamma$. This is a rather unsatisfying estimand, as we are typically interested in
estimating $\delta$, the average treatment effect, rather than a weighted average treatment effect,
particularly where the weights are some unobservable and instrument-specific set of $\Gamma$s
(Heckman & Robb, 1985a, 1986; Heckman, Urzua, & Vytlacil, 2006).
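
The decomposition in Equation (3) can be checked numerically. The following is a minimal Python sketch, assuming a hypothetical five-person population; the arrays `gamma_i` (person-specific compliance) and `delta_i` (person-specific effect of M on Y), and every value in them, are illustrative assumptions rather than data from any study. It compares the ratio of the average intent-to-treat effect to the average compliance with the average mediator effect plus the covariance term.

```python
import numpy as np

# Hypothetical finite population of five people (illustrative values only).
gamma_i = np.array([1.0, 1.0, 0.0, 0.5, 0.5])   # person-specific compliance
delta_i = np.array([4.0, 2.0, 1.0, 3.0, 0.0])   # person-specific effect of M on Y

b_i = gamma_i * delta_i        # person-specific intent-to-treat effect (Equation 2)
beta = b_i.mean()              # average intent-to-treat effect
gamma = gamma_i.mean()         # average compliance
delta = delta_i.mean()         # average treatment effect (ATE)

# Population covariance (divide by N, not N - 1) so the identity holds exactly.
cov_gd = np.mean((gamma_i - gamma) * (delta_i - delta))

iv_estimand = beta / gamma                # left-hand side of Equation (3)
decomposition = delta + cov_gd / gamma    # right-hand side of Equation (3)

# The same estimand written as a mean of the effects weighted by compliance.
cwate = np.average(delta_i, weights=gamma_i)

print(iv_estimand, decomposition, cwate, delta)
```

In this made-up population the instrumental variables estimand exceeds the average effect because the people with larger effects also comply more, which is the situation the covariance term captures.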

There are two different solutions to this problem that yield a well-defined estimand. 
First, we can simply assume 

Assumption (vii.a): No person-specific compliance-effect covariance: $\mathrm{Cov}(\Gamma, \Delta) = 0$,

in which case (3) identifies the population average treatment effect (ATE) as $\delta = \beta/\gamma$.
However, this assumption may be implausibly strong in some applications. The assumption 
says literally that the person-specific impact of M on Y is uncorrelated with that person’s 
inclination to comply. However, if persons have some knowledge of how well they will 
respond to M, they may select a level of compliance accordingly. For example, a person
who correctly expects $\Delta$ to be large will be motivated to seek a higher value of M; if
assignment to treatment facilitates access to a higher value of M, such a person will comply
more than will a person who correctly expects $\Delta$ to be zero.

In the case where both T and M are binary, we can adopt an alternative assumption
that may be more tenable than (vii.a). In this case, Angrist, Imbens and Rubin (1996) note
that $\Gamma$ can take on only three possible values: $\Gamma = 1$ for those for whom the instrument T
determines their mediator value ("compliers"); $\Gamma = 0$ for those for whom the instrument
does not affect the mediator ("always-takers" and "never-takers"); or $\Gamma = -1$ for those who
experience the opposite of the intended mediator condition ("defiers"). They then assume
that there are no "defiers" in the population, that is, no one for whom exposure to the instrument
T causes them to switch from M = 1 to M = 0:

Assumption (vii.b): No defiers: $\Gamma \in \{0,1\}$.

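Under this assumption, we can simplify the expression for the CWATE in Equation (3) to

$$
\begin{aligned}
\frac{\beta}{\gamma} &= \frac{\Pr(\Gamma = 1)\cdot E[\Gamma\cdot\Delta \mid \Gamma = 1] + \Pr(\Gamma = 0)\cdot E[\Gamma\cdot\Delta \mid \Gamma = 0]}{\Pr(\Gamma = 1)\cdot E[\Gamma \mid \Gamma = 1] + \Pr(\Gamma = 0)\cdot E[\Gamma \mid \Gamma = 0]} \\
&= \frac{\Pr(\Gamma = 1)\cdot E[\Delta \mid \Gamma = 1] + \Pr(\Gamma = 0)\cdot 0}{\Pr(\Gamma = 1)\cdot 1 + \Pr(\Gamma = 0)\cdot 0} \\
&= E(\Delta \mid \Gamma = 1) \\
&= \delta_c,
\end{aligned}
\tag{4}
$$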

where $\Pr(\Gamma = 1)$ is the proportion of compliers in the population. Angrist, Imbens, and
Rubin (1996) termed $\delta_c$ the "local average treatment effect" (LATE), also known as the
average treatment effect on the compliers, the complier average treatment effect (CATE) or
the complier average causal effect (CACE). Equation (4) shows that the LATE is a special
case of the CWATE when both T and M are binary and the no defiers assumption holds.²
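
As a small numerical illustration of Equation (4), the Python sketch below builds a hypothetical six-person population with binary T and M and no defiers. The potential mediator values `m0` and `m1` and the effects `delta_i` are invented for illustration. Under these assumptions the ratio of the average intent-to-treat effect to the average compliance equals the mean effect among compliers, which generally differs from the population average effect.

```python
import numpy as np

# Hypothetical population with binary T and M (illustrative values only).
m0 = np.array([0, 0, 0, 1, 0, 0])   # mediator value if assigned to control
m1 = np.array([1, 1, 1, 1, 0, 0])   # mediator value if assigned to treatment
delta_i = np.array([3.0, 5.0, 1.0, 10.0, -2.0, 0.0])  # effect of M on Y

gamma_i = m1 - m0                   # 1 = complier, 0 = always-taker or never-taker
assert np.all(np.isin(gamma_i, [0, 1]))   # the no-defiers assumption holds here

beta = np.mean(gamma_i * delta_i)   # average intent-to-treat effect
gamma = np.mean(gamma_i)            # average compliance = share of compliers

late = beta / gamma                           # instrumental variables estimand
complier_mean = delta_i[gamma_i == 1].mean()  # mean effect among compliers
ate = delta_i.mean()                          # population average effect, for contrast

print(late, complier_mean, ate)
```

Here the always-taker (the fourth person) has a large effect that enters the population average but not the complier average, so the two estimands differ.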

Summary of Single-Site, Single Mediator IV Assumptions 
Approaching the instrumental variable model from a potential outcomes framework 
is particularly useful when we allow mediator effects to be heterogeneous. After imposing 
assumptions (i)-(vi) (SUTVA, exclusion restriction, linearity, instrument effectiveness, and 
ignorable treatment assignment), this framework reveals the importance of either (vii.a), 
the no-compliance-effect-covariance assumption, or (vii.b) the no-defiers assumption. If 
both of these assumptions fail, the instrumental variable estimand is a
compliance-weighted average treatment effect (CWATE): those persons whose mediator is most
affected by the instrument will be assigned the greatest weight in the estimand. 


3. THE IV MODEL WITH MULTIPLE SITES AND MULTIPLE MEDIATORS 
In the single-site, single-mediator case, our challenge was to derive assumptions that
define the ATE ($\delta$) or the LATE ($\delta_c$) as a function of the average intent-to-treat effect $\beta$ and
the average compliance $\gamma$. We now consider the multi-site, multiple-mediator case, where
subjects within a multi-site trial are exposed to a treatment T, which may influence Y
through P distinct mediators $M_1, M_2, \ldots, M_P$. We derive a set of nine assumptions required to
identify the effects of these mediators. The key insight that enables us to identify these
effects is that site-specific values of $\beta$ become outcomes in a regression where multiple
site-specific compliances are predictors.

² In some settings (e.g., Little & Yau, 1998), participants assigned to the control cannot gain access to the
mediator, that is, $\Pr(m(0) = 1) = 0$. In this case, there are no "always-takers." We then see that LATE
becomes the "treatment effect on the treated" (TOT), that is, $\delta_c = E(\Delta \mid \Gamma = 1) = E(\Delta \mid m = 1) = \delta_{TOT}$.

Six of our assumptions are straightforward extensions of the assumptions derived
above in the single-site, single-mediator case. These include SUTVA, the exclusion
restriction, the two linearity assumptions, the assumption of ignorable assignment to T,
and either a no compliance-effect covariance assumption (to identify ATE) or a "no defiers"
assumption in the binary treatment, binary mediator case (to identify LATE). The
assumption of non-zero average compliance that was needed in the single-site case is
generalized to the assumption that there exists a full column rank site-by-compliance
matrix, literally a design matrix within a multiple regression framework. Standard
requirements of regression then generate two additional assumptions: an assumption that
one mediator does not affect another, and an assumption of independence among the
site-level compliances and site-level causal effects. These assumptions are described below.

We first assume that both SUTVA assumptions hold (i.a and i.b) with respect to the 
vector of P mediators: 

Assumption (i): Stable unit treatment value assumptions (SUTVA):

(i.a) Each unit i has one and only one potential value of the vector of mediators
$\mathbf{m}_i = (m_{1i}, m_{2i}, \ldots, m_{Pi})$ for each treatment condition t: in particular, for a
population of size N, $\mathbf{m}_i(t_1, t_2, \ldots, t_N) = \mathbf{m}_i(t_i)$ for all $i \in \{1, 2, \ldots, N\}$.

(i.b) Each unit i has one and only one potential outcome value of Y for each
treatment condition t and each vector of mediator values $\mathbf{m}_i$: in particular,
for a population of size N, $y_i(t_1, t_2, \ldots, t_N, \mathbf{m}_1, \mathbf{m}_2, \ldots, \mathbf{m}_N) = y_i(t_i, \mathbf{m}_i)$ for all
$i \in \{1, 2, \ldots, N\}$.

We next assume that assignment to T influences Y only through the list of P distinct
and observable mediators $M_1, M_2, \ldots, M_P$. Specifically, each participant has potential
mediator values $m_1(t), m_2(t), \ldots, m_P(t)$ for $t \in \mathbb{T}$. The exclusion restriction now requires
that T affects Y only through its effects on one or more of the mediators. That is:

Assumption (ii): Exclusion restriction: The treatment T affects Y only through its impact on
the set of P mediators, $\mathbf{M} = \{M_1, M_2, \ldots, M_P\}$. That is, $Y(t) = Y(t, \mathbf{m}(t)) = Y(\mathbf{m}(t))$.

As above, we also assume person-specific linearity of each M in T (iii) and person-specific 
linearity of Y in each of the mediators (iv). Specifically, we assume that the outcome Y is a 
linear function of the mediators, and that there are no interactions among the mediators. 

Assumption (iii): Person-specific linearity of each mediator in T: The person-specific effect of
T on each mediator $M_p$ is linear. That is, $m_p(t) = m_p(0) + t\Gamma_p$ for each p.

Assumption (iv): Person-specific linearity of Y in M: The person-specific effect of each
mediator $M_p$ on Y is linear. That is, $Y(\mathbf{m}) = Y(\mathbf{m} = \mathbf{0}) + \sum_{p=1}^{P} m_p \Delta_p$.

These imply, respectively, that the person-specific causal effect of T on $M_p$ is
$\Gamma_p = m_p(t) - m_p(t-1)$, and that the person-specific causal effect of $M_p$ on Y is
$\Delta_p = y(m_p) - y(m_p - 1)$, for all $p \in \{1, 2, \ldots, P\}$. As above, the person-specific causal effect of T on Y is
$B = y(t) - y(t-1)$. The observed outcome is $y(t) = y(0) + tB$.

We next assume that assignment to T does not influence a given mediator $M_p$
through any other mediator $M_q$. That is, the mediators do not influence one another. This
is required so that the estimation of the effects of a given mediator $M_q$ on Y is not
confounded with the effects of another mediator $M_p$.

Assumption (v): Parallel mediators:

$m_p(t, m_1, \ldots, m_{p-1}, m_{p+1}, \ldots, m_P) = m_p(t)$ for all $p \in \{1, 2, \ldots, P\}$.

Together, the five assumptions above define the person-specific intent-to-treat effect as

$$
\begin{aligned}
B &= y(t) - y(t-1) \\
  &= y(m_1(t), m_2(t), \ldots, m_P(t)) - y(m_1(t-1), m_2(t-1), \ldots, m_P(t-1)) \\
  &= \sum_{p=1}^{P} \Gamma_p \Delta_p.
\end{aligned}
\tag{5}
$$

Equation (5) says that the person-specific effect of T on Y can be written as the sum of the
products of the person-specific effects of T on each mediator and the person-specific effects
of that mediator on Y (we discuss the implications of a failure of the parallel mediator
assumption in Section 4 below). Taking the expectation of (5) over the population within
a site s yields

$$
E(B \mid S = s) = \beta_s = E\left[\sum_{p=1}^{P} \Gamma_p \Delta_p \,\middle|\, S = s\right]. \tag{6}
$$
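
The person-level identity in Equation (5) can be verified with a short Python sketch. It assumes a single hypothetical person with three mediators; the starting values, the compliances `gamma_p`, and the mediator effects `delta_p` are all invented for illustration. The two functions encode the linearity assumptions, the exclusion restriction, and the parallel mediators assumption (each mediator depends on t only, never on another mediator).

```python
import numpy as np

# Hypothetical person with P = 3 mediators (illustrative values only).
m_at_zero = np.array([1.0, 0.0, 2.0])   # mediator values when t = 0
gamma_p = np.array([0.5, 1.0, -0.2])    # person-specific effect of T on each mediator
delta_p = np.array([2.0, 3.0, 4.0])     # person-specific effect of each mediator on Y
y_baseline = 10.0                       # outcome when every mediator equals 0

def mediators(t):
    """Each mediator is linear in t and does not depend on the other mediators."""
    return m_at_zero + t * gamma_p

def outcome(t):
    """T affects Y only through the mediators, and Y is linear in each mediator."""
    return y_baseline + float(mediators(t) @ delta_p)

itt_effect = outcome(1) - outcome(0)         # B = y(t) - y(t - 1)
sum_of_products = float(gamma_p @ delta_p)   # sum over p of compliance times effect

print(itt_effect, sum_of_products)
```

Because both functions are linear, the same difference is obtained for any pair of adjacent values of t, not only for 1 and 0.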

As in the single-site case, we shall need unbiased estimates of the average 
compliances and intent-to-treat effects within each site. Letting K denote the number of 
sites, we invoke 

Assumption (vi): Ignorable within-site treatment assignment: The assignment of the
instrument T must be independent of the potential outcomes within each site:
$T \perp Y(t) \mid s$, $T \perp \mathbf{m}(t) \mid s$, $\forall\, t \in \mathbb{T}$, $s \in \{1, \ldots, K\}$.

As in the single-site case, it will next be useful to make either a set of
no-compliance-effect covariance assumptions, analogous to (vii.a), or a set of "no defiers"
assumptions analogous to (vii.b). The assumptions made here determine whether the
model identifies the average treatment effect (ATE) or the complier average treatment
effect (LATE).

First, if we wish to identify the average treatment effects (ATEs) of the mediators,
we may make the assumption that there is no within-site covariance between $\Delta_p$ and $\Gamma_p$ for
each mediator p:

Assumption (vii.a): No within-site compliance-effect covariance:

$\mathrm{Cov}_s(\Gamma_p, \Delta_p) = [\mathrm{Cov}(\Gamma_p, \Delta_p) \mid S = s] = 0$, for all p and s.

Alternatively, in the case where both T and M are binary and we wish to identify LATE, we
invoke

Assumption (vii.b): No defiers: $\Gamma_p \in \{0,1\}$ for all p.

Either of these two assumptions, in combination with Assumptions (i)-(vi), generates
a multiple regression problem in which an estimable site-average intent-to-treat effect $\beta_s$ is
the outcome and estimable site-average compliances $\gamma_{ps}$, $p = 1, 2, \ldots, P$, are predictors. To
see this, consider first the case of ATE where we invoke Assumption (vii.a). Under this
assumption, Equation (6) is

$$
\begin{aligned}
\beta_s &= E\left[\sum_{p=1}^{P} \Delta_p \Gamma_p \,\middle|\, S = s\right] \\
&= \sum_{p=1}^{P} \delta_{ps}\gamma_{ps} + \sum_{p=1}^{P} \mathrm{Cov}_s(\Delta_p, \Gamma_p) \\
&= \sum_{p=1}^{P} \delta_{ps}\gamma_{ps} \\
&= \sum_{p=1}^{P} \delta_p\gamma_{ps} + \sum_{p=1}^{P} (\delta_{ps} - \delta_p)\gamma_{ps} \\
&= \sum_{p=1}^{P} \delta_p\gamma_{ps} + \omega_s,
\end{aligned}
\tag{7}
$$

where $\delta_{ps}$ and $\gamma_{ps}$ are the average effect of $M_p$ on Y in site s and the average effect of T on
$M_p$ in site s, respectively; where $\delta_p$ is the average, across sites, of the $\delta_{ps}$'s; and where the
error term is $\omega_s = \sum_{p=1}^{P} (\delta_{ps} - \delta_p)\gamma_{ps}$.
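
To make the regression in Equation (7) concrete, the following Python sketch regresses site-average intent-to-treat effects on site-average compliances with no intercept. It is a simplified illustration under stated assumptions: the compliance matrix `G`, the effects in `delta`, and the five sites are hypothetical; the site-level quantities are treated as known rather than estimated from trial data; and the mediator effects are taken to be identical in every site, so the error term is zero.

```python
import numpy as np

# Hypothetical site-average compliances for K = 5 sites (rows) and P = 2 mediators (columns).
G = np.array([
    [0.6, 0.1],
    [0.4, 0.3],
    [0.2, 0.5],
    [0.7, 0.4],
    [0.3, 0.2],
])
delta = np.array([2.0, -1.0])   # assumed average effect of each mediator on Y

# With identical mediator effects in every site, the error term in Equation (7) is zero
# and each site's intent-to-treat effect is an exact linear combination of its compliances.
beta_s = G @ delta

# Least-squares regression of beta_s on the compliances. No intercept column is added
# because Equation (7) has none.
delta_hat, *_ = np.linalg.lstsq(G, beta_s, rcond=None)

print(delta_hat)
```

In applications the site-level compliances and intent-to-treat effects must themselves be estimated, and mediator effects typically vary across sites, so the recovery is not exact as it is here.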

If, in contrast, we have a binary M and seek to estimate LATE, we invoke Assumption
(vii.b), generating a multiple regression problem of exactly the same form. Specifically, we
can write (6) as

$$
\begin{aligned}
\beta_s &= E\left[\sum_{p=1}^{P} \Delta_p \Gamma_p \,\middle|\, S = s\right] \\
&= E\left[\sum_{p=1}^{P} E(\Delta_p \mid \Gamma_p = 1)\cdot\Pr(\Gamma_p = 1) \,\middle|\, S = s\right] \\
&= \sum_{p=1}^{P} E[\Delta_p \mid \Gamma_p = 1, S = s]\cdot\gamma_{ps} \\
&= \sum_{p=1}^{P} \delta_{cps}\gamma_{ps} \\
&= \sum_{p=1}^{P} \delta_{cp}\gamma_{ps} + \sum_{p=1}^{P} (\delta_{cps} - \delta_{cp})\gamma_{ps} \\
&= \sum_{p=1}^{P} \delta_{cp}\gamma_{ps} + \omega_{cs},
\end{aligned}
\tag{8}
$$

where $\delta_{cps}$ is the complier average effect of $M_p$ on Y in site s (the LATE for mediator p in
site s); $\delta_{cp}$ is the complier average effect of $M_p$ on Y in the population; $\gamma_{ps}$ is the average
effect of T on $M_p$ in site s (which, under the no-defiers assumption, is equal to the
proportion of the population in site s who are compliers with respect to mediator p); and
$\omega_{cs}$ is an error term equal to $\sum_{p=1}^{P} (\delta_{cps} - \delta_{cp})\gamma_{ps}$.

Equations (7) and (8) use the same outcome $\beta_s$ and the same predictors
$\gamma_{ps}$, $p = 1, 2, \ldots, P$. However, invoking the no-covariance assumption identifies the
coefficients of this model as the ATEs $\delta_p$, $p = 1, 2, \ldots, P$, with random error $\omega_s$ in (7), while
invoking the no-defiers assumption identifies the coefficients of this model as $\delta_{cp}$,
$p = 1, 2, \ldots, P$, and the errors as $\omega_{cs}$ in (8). To identify either of these models thus requires
additional standard assumptions for regression, namely that the design matrix be of full
rank and that the model errors be ignorable. Thus, in either case, we assume

Assumption (viii): Site-by-mediator compliance matrix has sufficient rank. In particular, if $\mathbf{G}$
is the $K \times P$ matrix of the $\gamma_{ps}$'s, we require $\mathrm{rank}(\mathbf{G}) = P$. This implies three specific
conditions:

(viii.a) The compliance of at least $P - 1$ of the mediators varies across sites. That is,
$\mathrm{Var}(\gamma_{ps}) = 0$ for at most one $p \in \{1, 2, \ldots, P\}$.

(viii.b) There are at least as many sites as mediators: $P \le K$.

(viii.c) There is some subset of Q site-specific compliance vectors,
$\boldsymbol{\gamma}_s = \{\gamma_{1s}, \gamma_{2s}, \ldots, \gamma_{Ps}\}$, where $K \ge Q \ge P$, that are linearly independent.

The sufficient rank assumption is a generalization of the familiar instrument effectiveness
assumption (Assumption (vi) in the single-site case). Note that when there is a single
mediator (P = 1), the site-by-mediator compliance matrix will have rank 1 so long as
$\gamma_{1s} \neq 0$ for at least one site s (the average compliance across sites may be zero, as long as it
is not zero in every site). Thus, when there is a single site and a single mediator, the
sufficient rank assumption is identical to the usual condition that the treatment has a
non-zero average impact on the mediator.
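
The rank condition can be checked directly once site-average compliances are in hand. The Python sketch below is a minimal illustration; the helper `has_sufficient_rank` and the three example matrices are hypothetical and are not taken from any of the studies discussed. It shows one matrix that satisfies the condition and two that fail it, one because the compliances for the two mediators are proportional across sites and one because there are fewer sites than mediators.

```python
import numpy as np

def has_sufficient_rank(G):
    """Return True if the K x P compliance matrix G has full column rank P."""
    G = np.asarray(G, dtype=float)
    return bool(np.linalg.matrix_rank(G) == G.shape[1])

# Three sites, two mediators; compliances are not proportional across sites.
G_ok = np.array([[0.6, 0.1], [0.4, 0.3], [0.2, 0.5]])

# The second column is always half the first, so the columns are collinear
# even though both compliances vary across sites.
G_collinear = np.array([[0.6, 0.3], [0.4, 0.2], [0.2, 0.1]])

# One site, two mediators: the rank can be at most 1.
G_too_few_sites = np.array([[0.6, 0.1]])

print(has_sufficient_rank(G_ok),
      has_sufficient_rank(G_collinear),
      has_sufficient_rank(G_too_few_sites))
```

The collinear case illustrates that variation in each mediator's compliance across sites is necessary but not sufficient: the compliances must also vary in ways that are not proportional to one another. With estimated compliances, exact rank deficiency is unlikely, and near-collinearity would be the practical concern.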

Our final assumption requires that the error term $\omega_s$ of Equation (7) or $\omega_{cs}$ of (8) be
ignorable. In order to identify the ATEs, we assume

Assumption (ix.a): Between-site compliance-effect independence: The site average
compliance of each mediator is independent of the site average effect of each mediator.
That is, $E[\delta_{qs} \mid \gamma_{1s}, \gamma_{2s}, \ldots, \gamma_{Ps}] = E[\delta_{qs}] = \delta_q$ for all $q \in \{1, 2, \ldots, P\}$.

Likewise, to identify the LATEs, we assume

Assumption (ix.b): Between-site compliance-effect independence: The site average
compliance of each mediator is independent of the site complier average effect of each
mediator. That is, $E[\delta_{cqs} \mid \gamma_{1s}, \gamma_{2s}, \ldots, \gamma_{Ps}] = E[\delta_{cqs}] = \delta_{cq}$.

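Under Assumption (ix.a), we can write the expected value of the error $\omega_s$ in (7) as

$$
\begin{aligned}
E[\omega_s \mid \gamma_{1s}, \gamma_{2s}, \ldots, \gamma_{Ps}] &= E\left[\sum_{q=1}^{P} (\delta_{qs} - \delta_q)\gamma_{qs} \,\middle|\, \gamma_{1s}, \gamma_{2s}, \ldots, \gamma_{Ps}\right] \\
&= \sum_{q=1}^{P} \gamma_{qs}\cdot E[(\delta_{qs} - \delta_q) \mid \gamma_{1s}, \gamma_{2s}, \ldots, \gamma_{Ps}] \\
&= \sum_{q=1}^{P} \gamma_{qs}\cdot E[(\delta_{qs} - \delta_q)] \\
&= 0.
\end{aligned}
\tag{9}
$$

By the same logic, Assumption (ix.b) implies that the expected value of the error term $\omega_{cs}$
in (8) is zero.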

Note that Assumptions (ix.a) and (ix.b) are each stronger than an assumption of no 
between-site compliance-effect covariance (the latter requires only no linear association 
between compliance and effect; the former requires no association whatsoever). 

Moreover, note that Assumptions (ix.a) and (ix.b) require not only that there be no 
compliance-effect association for a given mediator, but also that there be no cross-mediator 
compliance-effect association. That is, the site-average effect of T on a given mediator $M_q$
cannot be correlated with the site average effect of any mediator $M_p$ on Y.
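
The consequence of violating the between-site independence assumption can be seen in a deliberately simple Python sketch. It assumes one mediator and four hypothetical sites with invented compliances and effects, treats all site-level quantities as known, and uses a no-intercept least-squares slope (the helper `through_origin_slope` is illustrative). In the first case the site-average effect is unrelated to compliance; in the second, sites with higher compliance also have larger effects, although the cross-site mean effect is the same.

```python
import numpy as np

# One mediator, four hypothetical sites (illustrative values only).
gamma_s = np.array([0.2, 0.4, 0.6, 0.8])   # site-average compliance

def through_origin_slope(x, y):
    """Least-squares slope of y on x with no intercept."""
    return float(np.sum(x * y) / np.sum(x * x))

# Case A: the site-average effect is the same in every site,
# so it is trivially unrelated to compliance.
delta_s_indep = np.full(4, 2.0)
slope_indep = through_origin_slope(gamma_s, gamma_s * delta_s_indep)

# Case B: sites with higher compliance have larger effects; the mean effect is still 2.0.
delta_s_dep = np.array([1.0, 1.5, 2.5, 3.0])
slope_dep = through_origin_slope(gamma_s, gamma_s * delta_s_dep)

print(slope_indep, slope_dep, delta_s_dep.mean())
```

In the second case the regression coefficient overstates the cross-site average effect, because high-compliance sites, which carry more weight in the regression, are also the sites with the largest effects.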


4. DISCUSSION 

Summary of Multiple-Site, Multiple-Mediator IV Assumptions 
To summarize, in the case of a multi-site study in which a treatment T may affect the 
outcome Y through multiple mediators, we require a number of assumptions in order to 
identify the average causal effects of the mediators using MSMM-IV methods. In order to
identify the average treatment effect in the population, the relevant assumptions are 

(i) Stable unit treatment value assumptions 

(ii) Exclusion restriction 

(iii) Person-specific linearity of the mediators with respect to the treatment 

(iv) Person-specific linearity of the outcome with respect to the mediators 

(v) Parallel mediators 

(vi) Within-site ignorable treatment assignment 


(vii.a) Zero within-site compliance-effect covariance for each mediator

(viii) Compliance matrix has sufficient rank

(ix.a) Between-site cross-mediator compliance-effect independence 

In order to identify the complier average treatment effect (LATE) in the case of a binary
treatment and binary mediators, assumption (vii.a) is replaced by assumption (vii.b), no
defiers for any mediator; and assumption (ix.a) is replaced by (ix.b), between-site
independence of the compliance and complier average effects.

Note that six of these assumptions — SUTVA, the exclusion restriction, the two 
linearity assumptions, ignorable treatment assignment, and either the zero within-site 
compliance-effect covariance assumption or the no defiers assumption — are identical to 
those required for the single-site, single-instrument, single-mediator case (though often the 
two linearity assumptions are ignored because they are met trivially when the instrument 
and mediators are binary). Assumptions (v), (viii), and (ix) are specific to the multiple-site, 
multiple-mediator case (though the sufficient rank assumption (viii) is equivalent to the 
instrument effectiveness assumption when there is a single site and single mediator, as we 
note above). We discuss these three assumptions in more detail below. 

The Parallel Mediators Assumption 

The assumption that the mediators impact an outcome in parallel is a non-trivial 
assumption (see Appendix A for a detailed discussion). Consider the Duncan, Morris, and 
Rodrigues (2011) study described above. In this study, sixteen implementations of 
random-assignment welfare-to-work experiments were used to estimate the impact of 
three hypothesized mediators of the programs: income, hours worked, and welfare receipt.
The multiple-site, multiple-mediator IV models used assume that none of these mediators
affects the others. However, this is an implausible assumption, given that both hours 
worked and welfare receipt are clearly linked to income. 

The MTO study analyzed in Kling, Liebman, and Katz (2007) provides an 
opportunity to consider the parallel mediators assumption in concrete terms. In this study, 
random assignment to a voucher was hypothesized to affect outcomes via two potential 
mediators — use of the voucher and neighborhood poverty. Because neighborhood poverty 
could not be influenced except through use of the voucher, the implied structural model is 
that shown in Figure 2. 

Figure 2: 



In this model, treatment assignment affects neighborhood poverty ( NP ) only through use 
of a voucher (V). Both NP and V may then affect an outcome Y. As detailed in Appendix A, 
identification of $\delta_2 = E[\Delta_2]$ requires two key sets of additional assumptions. First, within 
each MTO site s, both a family’s likelihood of using the voucher if offered it and the change 
in neighborhood poverty experienced by a family if they use the voucher are uncorrelated 
with the effect of neighborhood poverty on that family. That is, families for whom a move to 
low-poverty neighborhoods would be particularly beneficial are assumed to be no more likely to use the 
voucher and move to low-poverty neighborhoods than are families for whom such a move 
would be less beneficial. Second, across MTO sites, there are no correlations between a) 
the average impact of neighborhood poverty and average voucher take-up rate; b) the 
average impact of neighborhood poverty and the average impact of voucher use on 
neighborhood poverty rates; c) the average impact of using a voucher and the average 
voucher take-up rate; or d) the average impact of using a voucher and the average 
impact of voucher use on neighborhood poverty rates. If, for example, sites where the use 
of a voucher had a large impact on neighborhood poverty (because it was relatively easy 
for families to move far from their original neighborhood) were also sites where use of a 
voucher moved families far from family and friendship networks that have a positive effect 
on outcomes, then the assumption of the independence of the direct effect of the voucher 
(through network supports in this example) and the effect of one mediator on another 
would be violated. Note that, in the MTO example, it would be possible to identify the total 
effect of the first mediator (use of the voucher), because there is no pathway from T to Y 
that does not go through V. Identifying the effect of NP and the direct effect of V on Y, 
however, requires additional assumptions about the independence of these effects and the 
effect of V on NP. Given the correlation of neighborhood poverty and other factors likely to 
influence the outcomes of interest in the MTO study, such assumptions may not be 
warranted. 


The Site-Average Compliance-Effect Independence Assumption 
The assumption that the site-average compliances are independent of the site-average 
effects is non-trivial. Because site-average compliance effects are not randomly 
assigned to sites, they may not be independent of the site-average mediator effects. 
Consider a simple example. Suppose we have a multi-site study of the impacts of welfare- 
to-work programs, as in Duncan, Morris, and Rodrigues (2011), where the programs are 
hypothesized to affect child outcomes by affecting mothers’ hours worked, income, and 
welfare receipt. Suppose that entry-level wages and the cost of living are higher in some 
sites than others. In this case, randomized assignment to a training program may induce a 
greater increase in hours worked and income (higher compliance) in high-wage sites than 
in low-wage sites (because the wage benefits of work are greater); however, the effect of 
increased income on child achievement may be lower in high-wage sites than in low-wage 
sites, because the cost of child care, preschool, and school quality is higher. Such a pattern 
would induce a negative correlation between the work and income effects of the program 
and the effects of income on children, violating the assumption of site-average compliance- 
effect independence. 

Although the compliance-effect independence assumption is not empirically 
verifiable, it may be falsifiable, given sufficient data. Equation (9) implies that, in a multi-site 
study with $P$ mediators and in which each of the nine assumptions is met, a plot in 
$(P+1)$-space of the site-average intent-to-treat effects (the $\beta_s$'s) against the $P$ site-average 
compliance effects (the $\gamma_{ps}$'s) will display a pattern of points scattered (with 
heteroskedastic variance) around a hyperplane passing through the origin with partial 
slopes $\partial\beta_s / \partial\gamma_{ps} = \delta_p$, for all $p$. A violation of the site-average compliance-effect independence 
assumption, however, implies that $E(\omega_s \mid \gamma_1, \ldots, \gamma_P) \neq 0$ for some value(s) of $\gamma_1, \ldots, \gamma_P$. As a 
result, the surface described by $E(\beta_s \mid \gamma_1, \ldots, \gamma_P)$ will be nonlinear. With sufficient data (a 
sufficient number of sites and sufficiently precise estimation of the $\beta_s$'s and $\gamma_s$'s for each 
site), one might have adequate statistical power to reliably detect such non-linearity, 
allowing one to reject the compliance-effect independence assumption. 
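
The falsification check described above can be sketched at the site level. The following is a minimal illustration in Python with NumPy, not a procedure taken from the paper: it assumes hypothetical arrays `beta_hat` (site ITT estimates) and `gamma_hat` (site compliance estimates), fits the hyperplane through the origin, and then compares it with an augmented fit that adds an intercept and squared compliances. All function and variable names are illustrative assumptions, and the sketch ignores sampling error in the site estimates and the heteroskedastic weights.

```python
import numpy as np

def nonlinearity_check(beta_hat, gamma_hat):
    """Compare a hyperplane through the origin with an augmented fit.

    beta_hat  : (K,) site-average ITT effects
    gamma_hat : (K, P) site-average compliances
    Returns (delta_hat, rss_linear, rss_augmented).
    """
    beta_hat = np.asarray(beta_hat, dtype=float)
    gamma_hat = np.asarray(gamma_hat, dtype=float)
    # Model implied by the assumptions: beta_s = sum_p gamma_ps * delta_p + error
    delta_hat, *_ = np.linalg.lstsq(gamma_hat, beta_hat, rcond=None)
    rss_linear = float(np.sum((beta_hat - gamma_hat @ delta_hat) ** 2))
    # Augmented model: intercept plus squared compliances (hypothetical choice)
    X = np.column_stack([np.ones(len(beta_hat)), gamma_hat, gamma_hat ** 2])
    coef, *_ = np.linalg.lstsq(X, beta_hat, rcond=None)
    rss_augmented = float(np.sum((beta_hat - X @ coef) ** 2))
    return delta_hat, rss_linear, rss_augmented

# Simulated, noise-free site-level data (illustrative values only)
rng = np.random.default_rng(0)
gamma = rng.uniform(0.2, 1.0, size=(40, 2))
delta = np.array([0.5, -0.3])
beta_ok = gamma @ delta                               # assumptions hold
beta_bad = beta_ok + 0.8 * (gamma[:, 0] - 0.6) ** 2   # effect varies with compliance

d_ok, rss_lin_ok, rss_aug_ok = nonlinearity_check(beta_ok, gamma)
d_bad, rss_lin_bad, rss_aug_bad = nonlinearity_check(beta_bad, gamma)
```

A large drop from `rss_linear` to `rss_augmented` is the kind of pattern that would cast doubt on the independence assumption; a formal test would also need to account for estimation error in both the ITT and compliance estimates.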

In Appendices B and C, we derive expressions for the bias in the 2SLS MSMM-IV 
estimator when the site-average compliance-effect independence assumption fails. 

The Sufficient Rank Assumption 

The sufficient rank assumption is relatively straightforward. In order to identify the 
effects of P mediators using an MSMM-IV model, we require at least as many sites as 
mediators; we require that the effect of treatment assignment on the mediators varies 
across sites (for at least $P - 1$ of the mediators); and we require that there are at least $P$ 
sites among which these effects are linearly independent. In many practical applications, 
these assumptions are likely to be met. The average effect of treatment assignment on a 
mediator is likely to vary across sites for a variety of reasons, including differential 
implementation, heterogeneity of populations, and differences among sites in baseline 
conditions or capacity. Moreover, unless the mediators are conceptually very similar, the 
effects of treatment assignment on the mediators are unlikely to be perfectly collinear. 
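
The rank condition can be checked directly on a matrix of site-average compliances. The sketch below is illustrative only: `has_sufficient_rank` is a hypothetical helper name, and the example matrices are made-up values, with rows as sites and columns as mediators.

```python
import numpy as np

def has_sufficient_rank(gamma, tol=None):
    """Return True if the K x P site-by-compliance matrix has rank P."""
    gamma = np.atleast_2d(np.asarray(gamma, dtype=float))
    K, P = gamma.shape
    # Need at least as many sites as mediators, and P linearly independent rows
    return bool(K >= P and np.linalg.matrix_rank(gamma, tol=tol) == P)

gamma_ok = np.array([[0.6, 0.1], [0.2, 0.5], [0.4, 0.4]])
# Second column is exactly half the first: compliances are perfectly collinear
gamma_collinear = np.array([[0.6, 0.3], [0.2, 0.1], [0.4, 0.2]])
gamma_one_site = np.array([[0.5, 0.2]])
```

In practice the compliances are estimated, so exact rank deficiency is rare; near-collinearity shows up instead as imprecise estimates.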

Nonetheless, in practical applications, the effects of treatment assignment on the 
mediators are likely to be somewhat correlated (though not perfectly) across sites. This 
may occur because in sites where a treatment is well-implemented, the treatment may 
affect all mediators more than in sites where it is poorly implemented. Or it may occur 
because the mediators are correlated in the world, leading to a correlation of compliances. 
For example, because income is correlated with hours worked, sites in which a treatment 
(such as a welfare-to-work experiment) induces large changes in hours worked will tend to 
also be sites in which the same treatment induces large changes in income. 

Although such correlations among the $\gamma_s$'s do not pose an identification problem for 
the MSMM-IV model (we require no assumption regarding the independence of the site- 
average compliances), they may pose a problem for estimation. Because the identification 
of the effects of the mediators depends on the separability of the site-average compliances, 
statistical power will be greatest — all else being equal — when compliances are not 
positively correlated. 
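
The effect of correlated compliances on precision can be illustrated with the homogeneous-effects variance formula from Appendix B, $\sigma^2(\gamma' W \gamma)^{-1}$. The sketch below is a hedged illustration with made-up numbers: the two compliance matrices contain the same site-level values in each column, and differ only in how those values are paired across sites. The helper name `delta_variance` is an assumption.

```python
import numpy as np

def delta_variance(gamma, w, sigma2=1.0):
    """sigma^2 * (gamma' W gamma)^{-1} for a K x P compliance matrix."""
    gamma = np.asarray(gamma, dtype=float)
    w = np.asarray(w, dtype=float)
    return sigma2 * np.linalg.inv(gamma.T @ (w[:, None] * gamma))

w = np.ones(4)  # equal site weights, for simplicity
# Same marginal values in each column; only the pairing across sites differs
gamma_separated = np.array([[1.0, 0.3], [0.2, 0.9], [0.9, 0.2], [0.1, 1.0]])
gamma_aligned = np.array([[1.0, 0.9], [0.2, 0.3], [0.9, 1.0], [0.1, 0.2]])

var_separated = np.diag(delta_variance(gamma_separated, w))
var_aligned = np.diag(delta_variance(gamma_aligned, w))
```

With these values the sampling variances under the positively correlated pairing are many times larger than under the other pairing, even though both matrices have full rank.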


5. CONCLUSION 

If each of the nine assumptions described above is met, the effects of each mediator 
are, in principle, identifiable from observed data. Such models provide a possible approach 
to estimating the effects of the mediators of treatment effects when such mediators cannot 
themselves be easily assigned at random. The assumptions necessary for consistent 
identification in MSMM-IV models are not, however, trivial. In addition to the usual IV 
assumptions, such models require several additional assumptions. The parallel mediator 
and site-average compliance-effect independence assumptions, in particular, are relatively 
strong, and cannot be empirically verified (though with large samples the compliance- 
effect independence assumption may be falsifiable). Justification of such models must rely, 
therefore, on sufficiently strong theory or prior evidence to warrant these assumptions. 

Although we have framed our discussion in the context of a multi-site randomized 
trial, where 'sites' are specific locations (different cities in the MTO example, different 
studies and cities in the welfare-to-work example), the same logic would apply to any study 
in which randomization occurs within identifiable subgroups of individuals. Thus, one 
could stratify the sample of a large randomized trial by sex, age, and race, and treat each 
sex-by-age-by-race cell as a 'site' in order to create multiple 'site'-by-treatment interactions 
as instruments. This would, in principle, allow one to identify the effects of multiple 
mediators within a single (large) randomized trial, but only under the set of assumptions 
we describe above. Alternatively, one could estimate a set of propensity scores, indicating 
each individual's 'propensity to comply' with each mediator, and then stratify the sample 
by vectors of these propensity scores. Using such strata as 'sites' in an MSMM-IV model 
would have two advantages: it would ensure there is little or no within-site 
compliance-effect covariance (because compliance would be nearly constant within compliance strata); 
and it may allow one to create strata among which the site-average compliances are 
uncorrelated, which may increase the precision of the estimates. Estimating 'propensity to 
comply,' however, is itself a non-trivial enterprise, relying on an additional set of rather 
strong assumptions (which we do not address here). 

Several important issues remain to be addressed in order to fully understand the 
use of MSMM-IV models. First, although failure of the assumptions will lead to inconsistent 
estimates, it is not clear how severe the bias resulting from plausible failures of the parallel 
mediators and compliance-effect independence assumptions will be. Second, we have not 
discussed the properties of specific estimators of MSMM-IV models or the computation of 
standard errors from such models. Both issues merit further investigation. 

Finally, although the nine assumptions we outline above ensure the consistent 
estimation of the effects of multiple mediators, they do not ensure unbiased estimation in 
finite samples. In single-site, single-mediator instrumental variables models, finite sample 
bias is a concern when the average compliance is small relative to its sampling variance. In 
multiple-site, multiple-mediator models, finite sample bias is more complex. In general, 
however, finite sample bias is likely to be a concern when both the average compliance 
(across sites) is small and the variance of the site-average compliances is small, relative to 
the sampling variation of the site-average compliances. A full discussion of finite sample 
bias is beyond the scope of this paper, however. 

APPENDIX A: RATIONALE FOR THE PARALLEL MEDIATORS ASSUMPTION 


For illustration, consider a simple case in which a treatment $T$ affects $Y$ through two 
mediators, $M_1$ and $M_2$, one of which affects the other, as illustrated in Figure A1 below. 

Figure A1: 



Let $\Gamma_1$ and $\Gamma_2$ be the person-specific effects of $T$ on $M_1$ and $M_2$, respectively. Note that 

$$\Gamma_2 = \Gamma_2^* + \Gamma_1 \Lambda_{12}, \tag{A1}$$

where $\Gamma_2^*$ is the direct effect of $T$ on $M_2$ (the effect not mediated by $M_1$), and $\Lambda_{12}$ is the 
effect of $M_1$ on $M_2$. Likewise, let $\Delta_1$ and $\Delta_2$ be the effects of $M_1$ and $M_2$ on $Y$, respectively. 
Note that 

$$\Delta_1 = \Delta_1^* + \Lambda_{12} \Delta_2, \tag{A2}$$

where $\Delta_1^*$ is the direct effect of $M_1$ on $Y$ (the effect not mediated by $M_2$). 

Now, the person-specific effect of $T$ on $Y$ is given by 

$$B = \Gamma_1 \Delta_1^* + \Gamma_2 \Delta_2. \tag{A3}$$

Typically, we want to estimate $\delta_1 = E[\Delta_1]$ and $\delta_2 = E[\Delta_2]$. Given a multi-site trial, within 
each site $s$, we have 

$$\begin{aligned}
\beta_s = E[B \mid s] &= E[\Gamma_1 \Delta_1^* \mid s] + E[\Gamma_2 \Delta_2 \mid s] \\
&= \delta_{1s}^* \gamma_{1s} + \delta_{2s} \gamma_{2s} + \mathrm{Cov}_s(\Gamma_1, \Delta_1^*) + \mathrm{Cov}_s(\Gamma_2, \Delta_2).
\end{aligned} \tag{A4}$$


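Let us assume $\mathrm{Cov}_s(\Gamma_1, \Delta_1^*) = 0$ and $\mathrm{Cov}_s(\Gamma_2, \Delta_2) = 0$. The first of these says that the 
person-specific compliance of $M_1$ is uncorrelated with the direct effect of $M_1$ on $Y$. The 
second can be written as 

$$\begin{aligned}
\mathrm{Cov}_s(\Gamma_2, \Delta_2) &= \mathrm{Cov}_s(\Gamma_2^* + \Gamma_1 \Lambda_{12}, \Delta_2) \\
&= \mathrm{Cov}_s(\Gamma_2^*, \Delta_2) + \mathrm{Cov}_s(\Gamma_1 \Lambda_{12}, \Delta_2) \\
&= \mathrm{Cov}_s(\Gamma_2^*, \Delta_2) + \gamma_{1s} \mathrm{Cov}_s(\Lambda_{12}, \Delta_2) + \lambda_{12s} \mathrm{Cov}_s(\Gamma_1, \Delta_2) \\
&\quad + E[(\Gamma_1 - \gamma_{1s})(\Lambda_{12} - \lambda_{12s})(\Delta_2 - \delta_{2s}) \mid S = s] \\
&= 0.
\end{aligned} \tag{A5}$$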


This says that the person-specific effect of $M_2$ cannot be correlated with any of the paths 
leading to it (and that the third centered moment of $\{\Gamma_1, \Lambda_{12}, \Delta_2\}$ must be zero, a condition 
that is met if the three terms are linearly related to one another and if each of them has a 
non-skew distribution). Thus, if the mediators are not parallel, then assumption (vii.a) 
must be expanded to include the assumption that, within sites, the direct effect of any 
mediator cannot be correlated with any upstream pathway leading from the treatment to 
that mediator. 
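
The expansion used in (A5) rests on the identity $\mathrm{Cov}(XY, Z) = E[X]\mathrm{Cov}(Y, Z) + E[Y]\mathrm{Cov}(X, Z) + E[(X - E X)(Y - E Y)(Z - E Z)]$. The following sketch checks that identity numerically on simulated draws. The distributions and variable names are arbitrary assumptions chosen only so that none of the three terms vanishes; they are not taken from any study.

```python
import numpy as np

def cov(a, b):
    """Population (divide-by-n) covariance of two 1-D arrays."""
    return float(np.mean((a - a.mean()) * (b - b.mean())))

def product_cov_terms(x, y, z):
    """Three terms whose sum equals Cov(XY, Z)."""
    third = float(np.mean((x - x.mean()) * (y - y.mean()) * (z - z.mean())))
    return float(x.mean()) * cov(y, z), float(y.mean()) * cov(x, z), third

rng = np.random.default_rng(1)
G1 = rng.gamma(2.0, 0.5, size=5000)                 # stands in for Gamma_1
L12 = 0.3 + 0.2 * G1 + rng.normal(size=5000)        # stands in for Lambda_12
D2 = 1.0 + 0.5 * L12 + rng.exponential(size=5000)   # stands in for Delta_2

lhs = cov(G1 * L12, D2)
terms = product_cov_terms(G1, L12, D2)
rhs = sum(terms)
```

Because the identity is algebraic, `lhs` and `rhs` agree up to floating-point error for any sample; the assumption in (A5) is that each term on the right is zero within every site.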

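Given this assumption, we have 

$$\begin{aligned}
\beta_s &= \delta_{1s}^* \gamma_{1s} + \delta_{2s} \gamma_{2s} \\
&= \delta_1^* \gamma_{1s} + \delta_2 \gamma_{2s} + \omega_s,
\end{aligned} \tag{A6}$$

where $\omega_s = (\delta_{1s}^* - \delta_1^*)\gamma_{1s} + (\delta_{2s} - \delta_2)\gamma_{2s}$. As above, we require the assumption that this 
error term be independent of $\gamma_{1s}$ and $\gamma_{2s}$: 

$$E[\omega_s \mid \gamma_{1s}, \gamma_{2s}] = \gamma_{1s} \cdot E[(\delta_{1s}^* - \delta_1^*) \mid \gamma_{1s}, \gamma_{2s}] + \gamma_{2s} \cdot E[(\delta_{2s} - \delta_2) \mid \gamma_{1s}, \gamma_{2s}] = 0. \tag{A7}$$

A necessary, but not sufficient, condition for this to be true is that 

$$\begin{aligned}
\mathrm{Cov}(\delta_{1s}^*, \gamma_{1s}) &= 0; \\
\mathrm{Cov}(\delta_{2s}, \gamma_{1s}) &= 0; \\
\mathrm{Cov}(\delta_{1s}^*, \gamma_{2s}) &= 0; \\
\mathrm{Cov}(\delta_{2s}, \gamma_{2s}) &= 0.
\end{aligned} \tag{A8}$$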

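The first two of these expressions indicate that the site-average compliance of mediator 1 is 
uncorrelated with the site-average direct effects of both mediators 1 and 2. The third and 
fourth expressions can be written as 

$$\begin{aligned}
\mathrm{Cov}(\delta_{1s}^*, \gamma_{2s}) &= \mathrm{Cov}(\delta_{1s}^*, \gamma_{2s}^* + \gamma_{1s} \lambda_{12s}) \\
&= \mathrm{Cov}(\delta_{1s}^*, \gamma_{2s}^*) + \gamma_1 \mathrm{Cov}(\delta_{1s}^*, \lambda_{12s}) + \lambda_{12} \mathrm{Cov}(\delta_{1s}^*, \gamma_{1s}) \\
&\quad + E[(\delta_{1s}^* - \delta_1^*)(\gamma_{1s} - \gamma_1)(\lambda_{12s} - \lambda_{12})]
\end{aligned}$$

and 

$$\begin{aligned}
\mathrm{Cov}(\delta_{2s}, \gamma_{2s}) &= \mathrm{Cov}(\delta_{2s}, \gamma_{2s}^* + \gamma_{1s} \lambda_{12s}) \\
&= \mathrm{Cov}(\delta_{2s}, \gamma_{2s}^*) + \gamma_1 \mathrm{Cov}(\delta_{2s}, \lambda_{12s}) + \lambda_{12} \mathrm{Cov}(\delta_{2s}, \gamma_{1s}) \\
&\quad + E[(\delta_{2s} - \delta_2)(\gamma_{1s} - \gamma_1)(\lambda_{12s} - \lambda_{12})].
\end{aligned} \tag{A9}$$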

Thus, we require that the site-average direct effects of each mediator be independent of the 
site-average compliance of each mediator and independent of the site-average effect of 
mediator 1 on mediator 2 (and that the third centered moment of $\{\gamma_{1s}, \lambda_{12s}, \delta_{2s}\}$ must be 
zero). In particular, we require $\mathrm{Cov}(\delta_{1s}^*, \lambda_{12s}) = \mathrm{Cov}(\delta_{2s}, \lambda_{12s}) = 0$. 

Given these assumptions, and the ignorable treatment assignment and sufficient 
rank assumptions (assumptions viii and ix), we can identify $\delta_1^*$ and $\delta_2$ from the regression 
model 

$$\beta_s = \delta_1^* \gamma_{1s} + \delta_2 \gamma_{2s} + \omega_s, \tag{A10}$$

because the $\beta_s$'s, $\gamma_{1s}$'s, and $\gamma_{2s}$'s are directly estimable from the observed data. 

Importantly, however, the assumptions are not sufficient to identify $\delta_1$, the total effect of 
$M_1$. Our assumptions imply that $\delta_1 = \delta_1^* + \lambda_{12} \delta_2$, but because our assumptions are, in 
general, insufficient to identify $\lambda_{12}$, we therefore cannot identify $\delta_1$. To identify $\lambda_{12}$, we 
would require a further assumption regarding the independence of $\gamma_{1s}$ and $\gamma_{2s}^*$.³ In general 
then, if we replace the parallel mediators assumption with a stronger set of assumptions 
about the independence of the person-specific and site-specific direct effects of each 
mediator with everything upstream from that mediator, we still can only identify the direct 
effect of each mediator (that part of the effect that does not operate through any other 
mediator in the model). 


3 To see this, consider the lefthand part of Figure A1. If we consider $M_2$ as the outcome, then $T$ affects $M_2$ 
both directly and through $M_1$. Now construct a second mediator $M^*$ that is in the direct pathway between $T$ 
and $M_2$. Let $M^* = T$ for all individuals, implying that $\Gamma^*$, the person-specific effect of $T$ on $M^*$, is equal to 1 for 
all individuals, and that $\Delta^*$, the person-specific effect of $M^*$ on $M_2$, is equal to $\Gamma_2^*$ for all individuals. Now we 
have a case of parallel mediators: $T$ affects $M_2$ through two parallel mediators $M_1$ and $M^*$. Assumption (vii) 
implies that $\gamma_{1s}$ is independent of $\delta_s^*$, but this is the same as assuming $\gamma_{1s} \perp \gamma_{2s}^*$. Thus, to identify $\lambda_{12}$, we 
require the additional assumption that the direct effects of $T$ on both mediators are independent. Note that 
nowhere else have we assumed that compliances are uncorrelated; this is a strong, and generally untenable, 
assumption. 

APPENDIX B: MEAN AND VARIANCE OF MSMM-IV ESTIMATORS 


In this appendix, we first show that using two-stage least squares (2SLS) to estimate 
the MSMM-IV model with site fixed effects and site-by-treatment interactions as 
instruments is equivalent to fitting a site-level weighted least squares regression model 
where the estimated ITT effect in a given site is the outcome and where the site-specific 
first-stage effects (the 'compliances’) are predictors. We next show that this estimator is 
unbiased under the assumptions we outline in the paper. Finally, we derive expressions for 
the sampling variance of the 2SLS MSMM-IV model under conditions of both homogeneity 
and heterogeneity of the mediator effects. 

Notation 

We have persons $i = 1, \ldots, n_s$ nested within sites $s = 1, \ldots, K$. Let $N = \sum_{s=1}^{K} n_s$. 
Person $i$ in site $s$ is assigned to treatment condition $T_{is}$ (which is measured in an 
interval-scaled metric) and is observed to have a vector of $P$ continuous mediators 
$\mathbf{M}_{is} = (M_{1is}, M_{2is}, \ldots, M_{Pis})'$, and outcome $Y_{is}$. Under the nine assumptions outlined above, 
$T_{is}$ is an instrument that identifies the vector of effects $\boldsymbol{\delta} = (\delta_1, \delta_2, \ldots, \delta_P)'$ of mediators 
$M_1, M_2, \ldots, M_P$ on $Y$. 

Let $\mathbf{Y}$ be the $N \times 1$ vector of observed outcomes. Let $\mathbf{1}$ be the $N \times 1$ vector with 
elements equal to unity and let $\eta$ be a scalar. Let $\mathbf{M}$ be the $N \times P$ matrix of observed 
mediators. Let $\mathbf{T}$ be the $N \times N$ matrix with the values of $T$ on the diagonals. Finally, let $\mathbf{S}$ be 
the $N \times K$ matrix with element $s_{ik} = 1$ if person $i$ is in site $k$ and $s_{ik} = 0$ otherwise. 

The Two-stage Least Squares Estimator 


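The 2SLS model is 

$$\mathbf{Y} = \mathbf{S}\boldsymbol{\eta} + E(\mathbf{M} \mid \mathbf{T})\boldsymbol{\delta} + \mathbf{u}, \quad \mathbf{u} \sim (0, \sigma^2 \mathbf{I}), \tag{B1}$$

where $\boldsymbol{\eta}$ is a $K \times 1$ vector of site-specific intercepts, and where the conventional "first stage" 
gives us $E(\mathbf{M} \mid \mathbf{T}) = \mathbf{S}\boldsymbol{\mu} + \mathbf{T}\mathbf{S}\boldsymbol{\gamma}$, where $\boldsymbol{\mu}$ is the $K \times P$ matrix of site-specific intercepts from 
the first stage equations and $\boldsymbol{\gamma}$ is the $K \times P$ matrix of compliance parameters from the first 
stage equations (i.e., $\gamma_{sp}$ is the average effect of $T$ on $M_p$ in site $s$).⁴ Thus, (B1) is equivalent 
to 

$$\mathbf{Y} = \mathbf{S}\boldsymbol{\eta} + \mathbf{S}\boldsymbol{\mu}\boldsymbol{\delta} + \mathbf{T}\mathbf{S}\boldsymbol{\gamma}\boldsymbol{\delta} + \mathbf{u}. \tag{B2}$$

The fixed effects estimator of $\boldsymbol{\delta}$ can be obtained by centering the elements of (B2) around 
their site means, yielding 

$$\mathbf{Y}^* = \mathbf{T}^*\mathbf{S}\boldsymbol{\gamma}\boldsymbol{\delta} + \mathbf{u}^*, \tag{B3}$$

where $\mathbf{Y}^*$ is the $N \times 1$ vector with elements $Y_{is}^* = Y_{is} - \bar{Y}_{\cdot s}$; $\mathbf{T}^*$ is the $N \times N$ matrix with 
diagonal elements $T_{is}^* = T_{is} - \bar{T}_{\cdot s}$; and $\mathbf{u}^*$ is the $N \times 1$ vector with elements $u_{is}^* = u_{is} - \bar{u}_{\cdot s}$. 
Now, the OLS estimator for (B3) will be 

$$\hat{\boldsymbol{\delta}} = (\boldsymbol{\gamma}'\mathbf{S}'\mathbf{T}^{*\prime}\mathbf{T}^*\mathbf{S}\boldsymbol{\gamma})^{-1}(\boldsymbol{\gamma}'\mathbf{S}'\mathbf{T}^{*\prime}\mathbf{Y}^*). \tag{B4}$$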


4 Note that (B1) assumes that errors are i.i.d.; this is a standard (though potentially problematic) assumption 
in 2SLS models. In particular, if the $\delta_{ps}$'s vary across sites, the i.i.d. assumption is likely to be invalid. Note 
that the i.i.d. assumption is an assumption of a specific IV estimator, rather than an identifying assumption of 
the MSMM-IV method in general. 

Now we note that $\mathbf{W} = \mathbf{S}'\mathbf{T}^{*\prime}\mathbf{T}^*\mathbf{S}$ will be the diagonal $K \times K$ weight matrix with diagonal 
elements equal to $w_s = n_s \sigma_{Ts}^2$, where $\sigma_{Ts}^2$ is the variance of $T$ in site $s$. So we have 

$$\hat{\boldsymbol{\delta}} = (\boldsymbol{\gamma}'\mathbf{W}\boldsymbol{\gamma})^{-1}(\boldsymbol{\gamma}'\mathbf{S}'\mathbf{T}^{*\prime}\mathbf{Y}^*). \tag{B5}$$

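Now note that if we fit the reduced form fixed effects model $\mathbf{Y} = \mathbf{S}\boldsymbol{\theta} + \mathbf{T}\mathbf{S}\boldsymbol{\beta}$, where $\boldsymbol{\theta}$ is a 
$K \times 1$ vector of site-specific intercepts and $\boldsymbol{\beta}$ is the $K \times 1$ vector of ITT effects (i.e., $\beta_s$ is the 
average effect of $T$ on $Y$ in site $s$), using the same centering strategy as above, we get 

$$\hat{\boldsymbol{\beta}} = \mathbf{W}^{-1}(\mathbf{S}'\mathbf{T}^{*\prime}\mathbf{Y}^*), \quad \text{so that} \quad \mathbf{W}\hat{\boldsymbol{\beta}} = \mathbf{S}'\mathbf{T}^{*\prime}\mathbf{Y}^*. \tag{B6}$$

Substituting (B6) into (B5) yields 

$$\hat{\boldsymbol{\delta}} = (\boldsymbol{\gamma}'\mathbf{W}\boldsymbol{\gamma})^{-1}(\boldsymbol{\gamma}'\mathbf{W}\hat{\boldsymbol{\beta}}). \tag{B7}$$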


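We can therefore reformulate the 2SLS regression model in (B2) as a site-level weighted 
least squares regression of the estimated ITT effects on the site-specific compliances, 
where the weights are $w_s$: 

$$\hat{\boldsymbol{\beta}} = \boldsymbol{\gamma}\boldsymbol{\delta} + \boldsymbol{\omega}, \quad \boldsymbol{\omega} \sim (0, \sigma^2 \mathbf{W}^{-1}). \tag{B8}$$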
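
The equivalence between the person-level 2SLS estimator and the site-level weighted least squares regression can be checked numerically. The sketch below is illustrative and uses simulated data with made-up parameter values; the helper names (`site_sums`, `center_within_site`, `msmm_iv_two_ways`) are assumptions. It uses estimated first-stage slopes in place of $\boldsymbol{\gamma}$, computes the second stage of 2SLS on site-centered data, and compares the result with the weighted regression of the site ITT estimates on the site compliance estimates.

```python
import numpy as np

def site_sums(values, site, K):
    """Sum a 1-D array within each of K sites."""
    return np.bincount(site, weights=values, minlength=K)

def center_within_site(x, site, K):
    """Subtract the site mean from each element of a 1-D array."""
    means = site_sums(x, site, K) / np.bincount(site, minlength=K)
    return x - means[site]

def msmm_iv_two_ways(site, T, M, Y):
    K, P = int(site.max()) + 1, M.shape[1]
    Tc = center_within_site(T, site, K)
    Yc = center_within_site(Y, site, K)
    Mc = np.column_stack([center_within_site(M[:, p], site, K) for p in range(P)])
    w = site_sums(Tc ** 2, site, K)          # w_s = n_s * (variance of T in site s)
    # Site-specific first-stage slopes (K x P) and ITT effects (K,)
    gamma_hat = np.column_stack(
        [site_sums(Tc * Mc[:, p], site, K) for p in range(P)]) / w[:, None]
    beta_hat = site_sums(Tc * Yc, site, K) / w
    # Route 1: second stage of 2SLS on the fitted, site-centered mediators
    M_fitted = Tc[:, None] * gamma_hat[site]
    delta_2sls, *_ = np.linalg.lstsq(M_fitted, Yc, rcond=None)
    # Route 2: site-level WLS of beta_hat on gamma_hat with weights w
    lhs = gamma_hat.T @ (w[:, None] * gamma_hat)
    delta_wls = np.linalg.solve(lhs, gamma_hat.T @ (w * beta_hat))
    return delta_2sls, delta_wls

# Simulated example: 6 sites, 2 mediators, binary treatment (values are arbitrary)
rng = np.random.default_rng(0)
K, n = 6, 300
site = np.repeat(np.arange(K), n)
T = rng.integers(0, 2, size=K * n).astype(float)
gamma = rng.uniform(0.2, 1.0, size=(K, 2))
M = gamma[site] * T[:, None] + rng.normal(size=(K * n, 2))
Y = M @ np.array([0.5, -0.3]) + 0.1 * site + rng.normal(size=K * n)

d_2sls, d_wls = msmm_iv_two_ways(site, T, M, Y)
```

The two routes agree to floating-point precision because the fitted mediators satisfy $\hat{\mathbf{M}}^{*\prime}\hat{\mathbf{M}}^* = \hat{\boldsymbol{\gamma}}'\mathbf{W}\hat{\boldsymbol{\gamma}}$ and $\hat{\mathbf{M}}^{*\prime}\mathbf{Y}^* = \hat{\boldsymbol{\gamma}}'\mathbf{W}\hat{\boldsymbol{\beta}}$.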

Equation (B8) implicitly assumes that $\boldsymbol{\delta}$ is homogeneous across sites. More generally, 
however, the effect of the mediators may vary among sites. In order to compute the bias 
and variance of the MSMM-IV 2SLS estimator, we consider the general case where the 
effect of each mediator may be heterogeneous. First we define $\boldsymbol{\gamma}_s$ as the $s$th row of $\boldsymbol{\gamma}$ (that is, 
$\boldsymbol{\gamma}_s$ is the $1 \times P$ vector of compliances in site $s$) and we define $\boldsymbol{\delta}_s$ as the $P \times 1$ vector of effects 
of the mediators in site $s$. Note that we can write the estimated ITT effect in site $s$ as 


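$$\begin{aligned}
\hat{\beta}_s &= E(B \mid s) + e_s \\
&= \sum_{r=1}^{P} E(\Gamma_r \cdot \Delta_r \mid s) + e_s \\
&= \sum_{r=1}^{P} \left[\gamma_{rs}\delta_{rs} + \mathrm{cov}(\Gamma_r, \Delta_r) \mid s\right] + e_s \\
&= \sum_{r=1}^{P} \left[\gamma_{rs}\delta_r + \gamma_{rs}(\delta_{rs} - \delta_r) + \mathrm{cov}(\Gamma_r, \Delta_r) \mid s\right] + e_s \\
&= \boldsymbol{\gamma}_s\boldsymbol{\delta} + \boldsymbol{\gamma}_s(\boldsymbol{\delta}_s - \boldsymbol{\delta}) + C_s + e_s \\
&= \boldsymbol{\gamma}_s\boldsymbol{\delta} + \boldsymbol{\gamma}_s\mathbf{b}_s + C_s + e_s,
\end{aligned} \tag{B9}$$

where $e_s = \hat{\beta}_s - \beta_s \sim N(0, \sigma^2 / w_s)$; $\mathbf{b}_s = \boldsymbol{\delta}_s - \boldsymbol{\delta} \sim N(0, \boldsymbol{\tau})$, $\boldsymbol{\tau}$ being a $P \times P$ covariance matrix; 
and where $C_s = \sum_{r=1}^{P} \mathrm{cov}(\Gamma_r, \Delta_r) \mid s$. We can then write (B9) as 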


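$$\begin{bmatrix} \hat{\beta}_1 \\ \hat{\beta}_2 \\ \vdots \\ \hat{\beta}_K \end{bmatrix}
= \begin{bmatrix} \boldsymbol{\gamma}_1 \\ \boldsymbol{\gamma}_2 \\ \vdots \\ \boldsymbol{\gamma}_K \end{bmatrix} \boldsymbol{\delta}
+ \begin{bmatrix} \boldsymbol{\gamma}_1 & 0 & \cdots & 0 \\ 0 & \boldsymbol{\gamma}_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \boldsymbol{\gamma}_K \end{bmatrix}
\begin{bmatrix} \mathbf{b}_1 \\ \mathbf{b}_2 \\ \vdots \\ \mathbf{b}_K \end{bmatrix}
+ \begin{bmatrix} C_1 \\ C_2 \\ \vdots \\ C_K \end{bmatrix}
+ \begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_K \end{bmatrix} \tag{B10}$$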


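Or, more compactly, as 

$$\hat{\boldsymbol{\beta}} = \boldsymbol{\gamma}\boldsymbol{\delta} + \mathbf{Z}\mathbf{b} + \mathbf{C} + \mathbf{e}, \tag{B11}$$

where $\mathbf{C}$ and $\mathbf{e}$ are the $K \times 1$ vectors of the $C_s$'s and $e_s$'s and $\mathbf{Z}$ is the block-diagonal matrix 
containing the $\boldsymbol{\gamma}_s$ vectors. Substituting (B11) into (B7) yields 

$$\begin{aligned}
\hat{\boldsymbol{\delta}} &= (\boldsymbol{\gamma}'\mathbf{W}\boldsymbol{\gamma})^{-1}(\boldsymbol{\gamma}'\mathbf{W}\hat{\boldsymbol{\beta}}) \\
&= (\boldsymbol{\gamma}'\mathbf{W}\boldsymbol{\gamma})^{-1}\boldsymbol{\gamma}'\mathbf{W}(\boldsymbol{\gamma}\boldsymbol{\delta} + \mathbf{Z}\mathbf{b} + \mathbf{C} + \mathbf{e}) \\
&= \boldsymbol{\delta} + (\boldsymbol{\gamma}'\mathbf{W}\boldsymbol{\gamma})^{-1}\boldsymbol{\gamma}'\mathbf{W}(\mathbf{Z}\mathbf{b} + \mathbf{C} + \mathbf{e}).
\end{aligned} \tag{B12}$$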

Bias of the 2SLS Estimator 

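To find the bias in the 2SLS estimator, we take the conditional expectation of (B12), 
given $\boldsymbol{\gamma}$: 

$$E(\hat{\boldsymbol{\delta}} \mid \boldsymbol{\gamma}) = \boldsymbol{\delta} + (\boldsymbol{\gamma}'\mathbf{W}\boldsymbol{\gamma})^{-1}\boldsymbol{\gamma}'\mathbf{W}\left[\mathbf{Z}E(\mathbf{b} \mid \boldsymbol{\gamma}) + E(\mathbf{C} \mid \boldsymbol{\gamma}) + E(\mathbf{e} \mid \boldsymbol{\gamma})\right]. \tag{B13}$$

Under the assumption of ignorable assignment of $T$, $E(\mathbf{e} \mid \boldsymbol{\gamma}) = E(\mathbf{e}) = 0$. Under the no 
within-site compliance-effect covariance assumption, $\mathbf{C} = 0$, so $E(\mathbf{C} \mid \boldsymbol{\gamma}) = E(\mathbf{C}) = 0$. 
Finally, under the between-site compliance-effect independence assumption, $E(\mathbf{b} \mid \boldsymbol{\gamma}) = 
E(\mathbf{b}) = 0$. Therefore, $E(\hat{\boldsymbol{\delta}} \mid \boldsymbol{\gamma}) = \boldsymbol{\delta}$ and the estimator is unbiased. 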

Variance of the 2SLS Estimator 

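Noting that $\mathrm{Var}(\hat{\boldsymbol{\beta}}) = \mathbf{Z}\boldsymbol{\tau}\mathbf{Z}' + \sigma^2\mathbf{W}^{-1}$, we can write the variance of the 2SLS MSMM-IV 
estimator as 

$$\begin{aligned}
\mathrm{Var}(\hat{\boldsymbol{\delta}} \mid \boldsymbol{\gamma}) &= (\boldsymbol{\gamma}'\mathbf{W}\boldsymbol{\gamma})^{-1}\boldsymbol{\gamma}'\mathbf{W}\left[\mathrm{Var}(\hat{\boldsymbol{\beta}})\right]\mathbf{W}'\boldsymbol{\gamma}(\boldsymbol{\gamma}'\mathbf{W}\boldsymbol{\gamma})^{-1} \\
&= (\boldsymbol{\gamma}'\mathbf{W}\boldsymbol{\gamma})^{-1}\boldsymbol{\gamma}'\mathbf{W}\left[\mathbf{Z}\boldsymbol{\tau}\mathbf{Z}'\right]\mathbf{W}'\boldsymbol{\gamma}(\boldsymbol{\gamma}'\mathbf{W}\boldsymbol{\gamma})^{-1} + \sigma^2(\boldsymbol{\gamma}'\mathbf{W}\boldsymbol{\gamma})^{-1}.
\end{aligned} \tag{B14}$$

Note that if the effects are homogeneous, that is, if $\boldsymbol{\tau} = 0$, then (B14) becomes simply 

$$\mathrm{Var}(\hat{\boldsymbol{\delta}} \mid \boldsymbol{\gamma}) = \sigma^2(\boldsymbol{\gamma}'\mathbf{W}\boldsymbol{\gamma})^{-1}. \tag{B15}$$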
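
The two variance expressions can be evaluated with a few lines of linear algebra. The sketch below is illustrative, with made-up inputs and a hypothetical function name `var_delta_hat`. One interpretive assumption is labeled in the code: because $\mathbf{Z}$ is $K \times KP$ and $\boldsymbol{\tau}$ is $P \times P$, the product $\mathbf{Z}\boldsymbol{\tau}\mathbf{Z}'$ is computed here as $\mathbf{Z}(\mathbf{I}_K \otimes \boldsymbol{\tau})\mathbf{Z}'$, which treats the site-specific deviations $\mathbf{b}_s$ as independent across sites.

```python
import numpy as np

def var_delta_hat(gamma, w, sigma2, tau=None):
    """Var(delta_hat | gamma) as in (B14); with tau=None it reduces to (B15)."""
    gamma = np.asarray(gamma, dtype=float)
    w = np.asarray(w, dtype=float)
    K, P = gamma.shape
    W = np.diag(w)
    bread = np.linalg.inv(gamma.T @ W @ gamma)
    var_beta = sigma2 * np.diag(1.0 / w)           # sampling variance of the ITT estimates
    if tau is not None:
        # Z is K x KP block-diagonal with gamma_s in block s.
        # ASSUMPTION: Var(b) = I_K kron tau (b_s independent across sites).
        Z = np.zeros((K, K * P))
        for s in range(K):
            Z[s, s * P:(s + 1) * P] = gamma[s]
        var_beta = var_beta + Z @ np.kron(np.eye(K), np.asarray(tau, dtype=float)) @ Z.T
    return bread @ gamma.T @ W @ var_beta @ W @ gamma @ bread

gamma = np.array([[0.6, 0.1], [0.2, 0.5], [0.4, 0.4], [0.8, 0.3]])
w = np.array([50.0, 40.0, 60.0, 55.0])
v_homog = var_delta_hat(gamma, w, sigma2=2.0)
v_heter = var_delta_hat(gamma, w, sigma2=2.0, tau=np.diag([0.04, 0.09]))
```

Heterogeneity in the mediator effects adds a positive semi-definite term, so the diagonal of `v_heter` is at least as large as that of `v_homog`.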


APPENDIX C: EXPRESSIONS FOR THE BETWEEN-SITE 
COMPLIANCE-EFFECT COVARIANCE BIAS WHEN P = 1 OR P = 2. 

Here we derive expressions for the bias due to non-independence of the site-specific 
compliance and effect parameters in the 2SLS MSMM-IV estimator when there are one or 
two mediators. In order to simplify these expressions somewhat, and express them in 
terms of the means, variances, and covariances of the site-specific compliance and effect 
parameters, we require several simplifying assumptions. 

First, we assume $w_s$ is constant across sites (sample sizes and treatment variance 
are constant across sites). We also assume there is no within-site compliance-effect 
covariance (i.e., $\mathrm{cov}(\Gamma_1, \Delta_1) \mid s = \mathrm{cov}(\Gamma_2, \Delta_2) \mid s = 0$ for all $s \in 1, \ldots, K$). Next we assume that 
the $\gamma_s$'s and $\delta_s$'s are linearly related to one another (i.e., we allow them to be correlated, but 
constrain them to have a linear relationship such that there are constants $a_p$ and $b_p$ such 
that $\gamma_{ps} = a_p + b_p\delta_{ps} + e_{ps}$, $E(e_{ps} \mid \delta_{ps}) = E(e_{ps}) = 0$, for all $p \in 1, \ldots, P$). Finally, we assume 
that the $\gamma_s$'s and $\delta_s$'s have non-skew distributions (i.e., that $\sum_{s=1}^{K}(\gamma_{ps} - \bar{\gamma}_p)^3 = 
\sum_{s=1}^{K}(\delta_{ps} - \delta_p)^3 = 0$). Under these assumptions, (B12) becomes 
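$$\begin{aligned}
E(\hat{\boldsymbol{\delta}} - \boldsymbol{\delta} \mid \boldsymbol{\gamma}) &= (\boldsymbol{\gamma}'\boldsymbol{\gamma})^{-1}\boldsymbol{\gamma}'\mathbf{Z}\mathbf{b} \\
&= (\boldsymbol{\gamma}'\boldsymbol{\gamma})^{-1}\sum_{s=1}^{K}\boldsymbol{\gamma}_s'\boldsymbol{\gamma}_s(\boldsymbol{\delta}_s - \boldsymbol{\delta}) \\
&= (\boldsymbol{\gamma}'\boldsymbol{\gamma})^{-1}\sum_{s=1}^{K}\left[(\boldsymbol{\gamma}_s' - \bar{\boldsymbol{\gamma}}')(\boldsymbol{\gamma}_s - \bar{\boldsymbol{\gamma}})(\boldsymbol{\delta}_s - \boldsymbol{\delta}) + (\bar{\boldsymbol{\gamma}}'\boldsymbol{\gamma}_s + \boldsymbol{\gamma}_s'\bar{\boldsymbol{\gamma}} - \bar{\boldsymbol{\gamma}}'\bar{\boldsymbol{\gamma}})(\boldsymbol{\delta}_s - \boldsymbol{\delta})\right] \\
&= (\boldsymbol{\gamma}'\boldsymbol{\gamma})^{-1}\sum_{s=1}^{K}\left[(\boldsymbol{\gamma}_s' - \bar{\boldsymbol{\gamma}}')(\boldsymbol{\gamma}_s - \bar{\boldsymbol{\gamma}})(\boldsymbol{\delta}_s - \boldsymbol{\delta}) + \left(\bar{\boldsymbol{\gamma}}'(\boldsymbol{\gamma}_s - \bar{\boldsymbol{\gamma}}) + (\boldsymbol{\gamma}_s' - \bar{\boldsymbol{\gamma}}')\bar{\boldsymbol{\gamma}} + \bar{\boldsymbol{\gamma}}'\bar{\boldsymbol{\gamma}}\right)(\boldsymbol{\delta}_s - \boldsymbol{\delta})\right],
\end{aligned} \tag{C1}$$

where $\bar{\boldsymbol{\gamma}}$ is the $1 \times P$ matrix containing the averages of the $\gamma_{ps}$'s across sites. Under the 
linearity and non-skew assumptions above, $\sum_{s=1}^{K}(\boldsymbol{\gamma}_s' - \bar{\boldsymbol{\gamma}}')(\boldsymbol{\gamma}_s - \bar{\boldsymbol{\gamma}})(\boldsymbol{\delta}_s - \boldsymbol{\delta}) = 0$. Likewise, 
it is straightforward to show that $\sum_{s=1}^{K}\bar{\boldsymbol{\gamma}}'\bar{\boldsymbol{\gamma}}(\boldsymbol{\delta}_s - \boldsymbol{\delta}) = 0$. After applying these assumptions, 
(C1) is now 

$$E(\hat{\boldsymbol{\delta}} - \boldsymbol{\delta} \mid \boldsymbol{\gamma}) = (\boldsymbol{\gamma}'\boldsymbol{\gamma})^{-1}\sum_{s=1}^{K}\left(\bar{\boldsymbol{\gamma}}'(\boldsymbol{\gamma}_s - \bar{\boldsymbol{\gamma}}) + (\boldsymbol{\gamma}_s' - \bar{\boldsymbol{\gamma}}')\bar{\boldsymbol{\gamma}}\right)(\boldsymbol{\delta}_s - \boldsymbol{\delta}). \tag{C2}$$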

Bias in the P = 1 Case 


When $P = 1$, (C2) becomes 

$$\begin{aligned}
E(\hat{\delta} - \delta \mid \gamma) &= \left(\bar{\gamma}_1^2 + \mathrm{var}(\gamma_{1s})\right)^{-1}\frac{1}{K}\sum_{s=1}^{K} 2\bar{\gamma}_1(\gamma_{1s} - \bar{\gamma}_1)(\delta_{1s} - \delta_1) \\
&= \frac{2\bar{\gamma}_1\mathrm{Cov}(\gamma_{1s}, \delta_{1s})}{\bar{\gamma}_1^2 + \mathrm{var}(\gamma_{1s})}.
\end{aligned} \tag{C3}$$

Note that (C3) can be rewritten as 

$$E(\hat{\delta} - \delta \mid \gamma) = 2\rho_{\gamma\delta}\sigma_\delta\frac{CV(\gamma)}{CV(\gamma)^2 + 1}, \tag{C4}$$

where $\rho_{\gamma\delta}$ is the correlation between $\gamma_{1s}$ and $\delta_{1s}$; $\sigma_\delta$ is the standard deviation of $\delta_{1s}$ across 
sites; and $CV(\gamma) = \sigma_\gamma / \bar{\gamma}$ is the coefficient of variation of $\gamma$. For given values of $\rho_{\gamma\delta}$ and $\sigma_\delta$, the 
bias is maximized when $CV(\gamma) = 1$. The bias decreases to 0 as $CV(\gamma) \to 0$ and as 
$CV(\gamma) \to \infty$. Thus, under some simplifying assumptions about the joint distribution of $\gamma$ 
and $\delta$ (linear association, non-skew distributions), the asymptotic bias in the multiple-site 
IV estimator can be written as a relatively simple function of the variances and covariances 
of $\gamma$ and $\delta$. It may be possible to bound the bias term using information about the plausible 
distributions of the $\gamma$'s and $\delta$'s obtained from other analyses. 
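
The single-mediator bias expression is simple enough to tabulate. The sketch below is illustrative only: the function names and input values are assumptions. It evaluates the coefficient-of-variation form over a small grid and confirms that it agrees with the moment form $2\bar{\gamma}_1\mathrm{Cov}(\gamma_{1s}, \delta_{1s}) / (\bar{\gamma}_1^2 + \mathrm{var}(\gamma_{1s}))$.

```python
import numpy as np

def bias_from_cv(rho, sigma_delta, cv):
    """Bias written in terms of the coefficient of variation of gamma."""
    cv = np.asarray(cv, dtype=float)
    return 2.0 * rho * sigma_delta * cv / (cv ** 2 + 1.0)

def bias_from_moments(gamma_bar, var_gamma, cov_gamma_delta):
    """Bias written in terms of the mean, variance, and covariance."""
    return 2.0 * gamma_bar * cov_gamma_delta / (gamma_bar ** 2 + var_gamma)

rho, sigma_delta = 0.4, 0.3          # hypothetical correlation and SD of effects
cv_grid = np.array([0.01, 0.5, 1.0, 2.0, 100.0])
bias_grid = bias_from_cv(rho, sigma_delta, cv_grid)

# Same quantity from moments: gamma_bar = 0.5, sd(gamma) = 0.25, so CV = 0.5
gamma_bar, sd_gamma = 0.5, 0.25
bias_moments = bias_from_moments(gamma_bar, sd_gamma ** 2, rho * sd_gamma * sigma_delta)
```

On this grid the bias peaks at a coefficient of variation of 1, where it equals $\rho_{\gamma\delta}\sigma_\delta$, and shrinks toward zero at both ends of the grid.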


Bias in the P = 2 Case 


When P — 2, (C2) becomes 

K 

E{ 8 - 8|v) = (v'y)- 1 £<rcy. - y) + (y; - y')y)(S s - S) 

S=1 


= Wy)- 1 


ly^ov^y^, 5 ls ) + y 1 cov(y 2s , 8 2s ) + y 2 cov(y ls , 8 2s )' 
.y 2 cov(y ls , 8 ls ) + 2 y 2 cov(y 2s , S 2s ) + 7 iCou(y 2s , 8 ls ). 


= K 2 


yl + var(y ls ) 
-Y1Y2 + cov(Yi s ,y 2s ) 


YiY 2 + C0V (Yis>Y2s) 
yl + var(y 2s ) 


ly^oviy^, 8 ls ) + yiCou(y 2s , 8 2s ) + y 2 cov(y ls , 8 2s )' 
-Y 2 cov(y ls ,8 ls ) + 2 y 2 cov(y 2s ,8 2s ) +y 1 cov(y 2s ,S ls ). 


1\ yl + var(y 2s ) 

D [~YiY 2 - cov(y ls ,y 2s ) 


~YiY 2 ~ cov(y ls ,y 2s ) 
yl + var(y ls ) 


2y x cov{y is , 5 ls ) + y x cov{y 2s , S 2s ) + y 2 cov(y ls , S 2s y 
.y 2 cov(y ls , S ls ) + 2 y 2 cov(y 2s ,8 2s ) + 7 iCou(y 2s , 5 ls )J ' 


(C5) 


where 
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2 

D = {yl + var(y ls ))(yf + var(y 2s )) - {yyy 2 + cov(y ls ,y 2s )) . 


(C6) 


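After a bit more matrix algebra and rearrangement, we have 

$$\begin{aligned}
E[\hat{\delta}_1 - \delta_1 \mid \boldsymbol{\gamma}] &= A_{11}\mathrm{Cov}(\gamma_{1s}, \delta_{1s}) + A_{12}\mathrm{Cov}(\gamma_{1s}, \delta_{2s}) + A_{21}\mathrm{Cov}(\gamma_{2s}, \delta_{1s}) + A_{22}\mathrm{Cov}(\gamma_{2s}, \delta_{2s}) \\
E[\hat{\delta}_2 - \delta_2 \mid \boldsymbol{\gamma}] &= B_{11}\mathrm{Cov}(\gamma_{1s}, \delta_{1s}) + B_{12}\mathrm{Cov}(\gamma_{1s}, \delta_{2s}) + B_{21}\mathrm{Cov}(\gamma_{2s}, \delta_{1s}) + B_{22}\mathrm{Cov}(\gamma_{2s}, \delta_{2s}),
\end{aligned} \tag{C7}$$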


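where 

$$\begin{aligned}
A_{11} &= \frac{1}{D}\left[\bar{\gamma}_1\bar{\gamma}_2^2 + 2\bar{\gamma}_1\mathrm{Var}(\gamma_{2s}) - \bar{\gamma}_2\mathrm{Cov}(\gamma_{1s}, \gamma_{2s})\right] \\
A_{22} &= \frac{-1}{D}\left[\bar{\gamma}_1\bar{\gamma}_2^2 - \bar{\gamma}_1\mathrm{Var}(\gamma_{2s}) + 2\bar{\gamma}_2\mathrm{Cov}(\gamma_{1s}, \gamma_{2s})\right] \\
A_{12} &= \frac{1}{D}\left[\bar{\gamma}_2^3 + \bar{\gamma}_2\mathrm{Var}(\gamma_{2s})\right] \\
A_{21} &= \frac{-1}{D}\left[\bar{\gamma}_1^2\bar{\gamma}_2 + \bar{\gamma}_1\mathrm{Cov}(\gamma_{1s}, \gamma_{2s})\right] \\
B_{11} &= \frac{-1}{D}\left[\bar{\gamma}_1^2\bar{\gamma}_2 - \bar{\gamma}_2\mathrm{Var}(\gamma_{1s}) + 2\bar{\gamma}_1\mathrm{Cov}(\gamma_{1s}, \gamma_{2s})\right] \\
B_{22} &= \frac{1}{D}\left[\bar{\gamma}_1^2\bar{\gamma}_2 + 2\bar{\gamma}_2\mathrm{Var}(\gamma_{1s}) - \bar{\gamma}_1\mathrm{Cov}(\gamma_{1s}, \gamma_{2s})\right] \\
B_{12} &= \frac{-1}{D}\left[\bar{\gamma}_1\bar{\gamma}_2^2 + \bar{\gamma}_2\mathrm{Cov}(\gamma_{1s}, \gamma_{2s})\right] \\
B_{21} &= \frac{1}{D}\left[\bar{\gamma}_1^3 + \bar{\gamma}_1\mathrm{Var}(\gamma_{1s})\right].
\end{aligned} \tag{C8}$$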
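
The coefficients multiplying each covariance can also be obtained numerically from the matrix form of the two-mediator bias, without expanding the algebra by hand. The sketch below is a hedged illustration: the function names and input values are assumptions. It solves the $2 \times 2$ system for a given set of compliance moments and recovers each coefficient by setting one compliance-effect covariance to 1 and the others to 0, which works because the bias is linear in those covariances.

```python
import numpy as np

def bias_p2(gamma_bar, cov_gamma, cov_gd):
    """Bias vector for P = 2 from the matrix form of the bias expression.

    gamma_bar : (2,) mean compliances
    cov_gamma : (2, 2) covariance matrix of (gamma_1s, gamma_2s)
    cov_gd    : (2, 2), cov_gd[j, k] = Cov(gamma_{j+1,s}, delta_{k+1,s})
    """
    gamma_bar = np.asarray(gamma_bar, dtype=float)
    g1, g2 = gamma_bar
    moment = np.outer(gamma_bar, gamma_bar) + np.asarray(cov_gamma, dtype=float)
    v = np.array([
        2 * g1 * cov_gd[0, 0] + g1 * cov_gd[1, 1] + g2 * cov_gd[0, 1],
        g2 * cov_gd[0, 0] + 2 * g2 * cov_gd[1, 1] + g1 * cov_gd[1, 0],
    ])
    return np.linalg.solve(moment, v)

def bias_coefficients(gamma_bar, cov_gamma):
    """coefs[e, j, k]: weight of Cov(gamma_{j+1,s}, delta_{k+1,s}) in the bias of delta_hat_{e+1}."""
    coefs = np.zeros((2, 2, 2))
    for j in range(2):
        for k in range(2):
            unit = np.zeros((2, 2))
            unit[j, k] = 1.0          # switch on a single covariance
            coefs[:, j, k] = bias_p2(gamma_bar, cov_gamma, unit)
    return coefs

gamma_bar = np.array([0.5, 0.4])                       # hypothetical mean compliances
cov_gamma = np.array([[0.04, 0.01], [0.01, 0.09]])     # hypothetical (co)variances
coefs = bias_coefficients(gamma_bar, cov_gamma)
```

For generic inputs such as these, every entry of `coefs` is nonzero, so the bias in each estimated mediator effect depends on all four compliance-effect covariances.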


The key thing to note here is that the bias in $\hat{\delta}_1$ depends not only on the covariance 
between $\gamma_{1s}$ and $\delta_{1s}$, but also on $\mathrm{Cov}(\gamma_{1s}, \delta_{2s})$, $\mathrm{Cov}(\gamma_{2s}, \delta_{1s})$, and $\mathrm{Cov}(\gamma_{2s}, \delta_{2s})$. Similarly, 
the bias in $\hat{\delta}_2$ depends not only on the covariance between $\gamma_{2s}$ and $\delta_{2s}$, but also on 
$\mathrm{Cov}(\gamma_{1s}, \delta_{2s})$, $\mathrm{Cov}(\gamma_{2s}, \delta_{1s})$, and $\mathrm{Cov}(\gamma_{1s}, \delta_{1s})$. Moreover, the biases are very complex 
functions of these covariances, so it will not be easy to predict their magnitude or direction 
in practical applications. 
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